CustomExtractFeatures
=======================

Extracts features from each string in an input sequence and produces a discrete label and a weight for every string.

This operator is mainly used for text feature engineering. It performs the following steps:

- Filters out blacklisted strings
- Computes a hash feature for each non-blacklisted string
- Derives a weight from the number of spaces in the string

.. math::

    \text{label}_i =
    \begin{cases}
    0, & \text{if } s_i \in \text{blacklist} \\
    \operatorname{hash}(s_i) \bmod K, & \text{otherwise}
    \end{cases}

.. math::

    \text{weight}_i =
    \begin{cases}
    0, & \text{if } s_i \in \text{blacklist} \\
    \text{space\_count}(s_i) + 1, & \text{otherwise}
    \end{cases}

where:

- :math:`s_i` is the :math:`i`-th input string
- :math:`K` is the fixed hash-space size (default :math:`10^6`)
- ``blacklist`` = {``""``, ``""``, ``" "``}

Inputs:
    - **string_pointers** - Array of pointers, each holding the start address of one string.
    - **string_lengths** - Array holding the length of each string.
    - **num_strings** - Number of input strings.
    - **core_mask** - Core mask.

Outputs:
    - **output_labels** - Address of the output label array (int32).
    - **output_weights** - Address of the output weight array.

Supported platforms:
    ``FT78NE`` ``MT7004``

.. note::
    - Data types supported on FT78NE:
        - fp32, fp64
        - int8, int16, int32
        - cplx64, cplx128
    - Data types supported on MT7004:
        - fp16, fp32
        - int16, int32
        - cplx64
    - When ``num_strings == 0``, the output ``label`` and ``weight`` are set to 0
    - Blacklisted strings are never hashed

**Shared-memory version:**

.. c:function:: void fp_extract_features_s(char** string_pointers, int* string_lengths, int num_strings, int* output_labels, float* output_weights, int core_mask)

.. c:function:: void dp_extract_features_s(char** string_pointers, int* string_lengths, int num_strings, int* output_labels, double* output_weights, int core_mask)

.. c:function:: void i8_extract_features_s(char** string_pointers, int* string_lengths, int num_strings, int* output_labels, int8_t* output_weights, int core_mask)

.. c:function:: void i16_extract_features_s(char** string_pointers, int* string_lengths, int num_strings, int* output_labels, int16_t* output_weights, int core_mask)

.. c:function:: void i32_extract_features_s(char** string_pointers, int* string_lengths, int num_strings, int* output_labels, int32_t* output_weights, int core_mask)

.. c:function:: void c64_extract_features_s(char** string_pointers, int* string_lengths, int num_strings, int* output_labels, float* output_weights, int core_mask)

.. c:function:: void c128_extract_features_s(char** string_pointers, int* string_lengths, int num_strings, int* output_labels, double* output_weights, int core_mask)

**C usage example:**

.. code-block:: c
    :linenos:
    :emphasize-lines: 15

    // FT78NE multi-core example
    /* Include the platform library header that declares
       fp_extract_features_s here. */

    int main(int argc, char* argv[]) {
        char* strings[] = {"hello world", "", "test data"};
        int lengths[] = {11, 3, 9};
        int num_strings = 3;

        int *labels = (int *)0xA0000000;      // output label buffer
        float *weights = (float *)0xB0000000; // output weight buffer

        int core_mask = 0xff;                 // core mask passed to the operator

        fp_extract_features_s(strings, lengths, num_strings, labels, weights, core_mask);
        return 0;
    }

**Private-memory version:**

.. c:function:: void fp_extract_features_p(char** string_pointers, int* string_lengths, int num_strings, int* output_labels, float* output_weights)

.. c:function:: void dp_extract_features_p(char** string_pointers, int* string_lengths, int num_strings, int* output_labels, double* output_weights)

.. c:function:: void i8_extract_features_p(char** string_pointers, int* string_lengths, int num_strings, int* output_labels, int8_t* output_weights)

.. c:function:: void i16_extract_features_p(char** string_pointers, int* string_lengths, int num_strings, int* output_labels, int16_t* output_weights)

.. c:function:: void i32_extract_features_p(char** string_pointers, int* string_lengths, int num_strings, int* output_labels, int32_t* output_weights)

.. c:function:: void c64_extract_features_p(char** string_pointers, int* string_lengths, int num_strings, int* output_labels, float* output_weights)

.. c:function:: void c128_extract_features_p(char** string_pointers, int* string_lengths, int num_strings, int* output_labels, double* output_weights)

**C usage example:**

.. code-block:: c
    :linenos:
    :emphasize-lines: 13

    // MT7004 single-core example
    /* Include the platform library header that declares
       fp_extract_features_p here. */

    int main(int argc, char* argv[]) {
        char* strings[] = {"example text"};
        int lengths[] = {12};
        int num_strings = 1;

        int *labels = (int *)0x10000000;      // output label buffer
        float *weights = (float *)0x11000000; // output weight buffer

        fp_extract_features_p(strings, lengths, num_strings, labels, weights);
        return 0;
    }
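
**Reference sketch of the label and weight rules:**

The two case formulas for ``label`` and ``weight`` can be sketched in plain C on a host machine. This is a minimal illustration under stated assumptions, not the library's implementation. The hash function is not specified in this document, so 32-bit FNV-1a stands in for it and the labels will likely differ from the real operator's. The names ``sketch_hash``, ``sketch_is_blacklisted`` and ``sketch_extract_features`` are hypothetical, and the blacklist check covers only the empty string and the single space.

```c
#include <assert.h>
#include <stdint.h>

/* K: the documented default hash-space size of 10^6. */
#define SKETCH_HASH_SPACE 1000000u

/* ASSUMPTION: 32-bit FNV-1a is a placeholder; the operator's real hash
 * function is not documented here. */
static uint32_t sketch_hash(const char *s, int len) {
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (int i = 0; i < len; ++i) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* ASSUMPTION: only the empty string and the single space are treated as
 * blacklisted in this sketch. */
static int sketch_is_blacklisted(const char *s, int len) {
    return len == 0 || (len == 1 && s[0] == ' ');
}

/* Hypothetical host-side reference: fills labels[i] and weights[i] for each
 * of the num_strings inputs, following the two case formulas. */
static void sketch_extract_features(const char *const *strings,
                                    const int *lengths, int num_strings,
                                    int *labels, float *weights) {
    assert(num_strings >= 0);
    for (int i = 0; i < num_strings; ++i) {
        const char *s = strings[i];
        int len = lengths[i];

        if (sketch_is_blacklisted(s, len)) {
            /* Blacklisted strings are never hashed. */
            labels[i] = 0;
            weights[i] = 0.0f;
            continue;
        }

        /* space_count(s_i): number of ' ' characters in the string. */
        int spaces = 0;
        for (int j = 0; j < len; ++j) {
            if (s[j] == ' ') {
                ++spaces;
            }
        }

        labels[i] = (int)(sketch_hash(s, len) % SKETCH_HASH_SPACE);
        weights[i] = (float)(spaces + 1);
    }
}
```

With the input ``"hello world"`` (one space), this sketch yields a weight of 2 and a label in the range ``[0, K)``. An empty string or a single space yields a label and weight of 0. A host-side reference like this can help validate the weights returned by the device functions, as long as the blacklist matches the one in use.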